-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #1384 by refactoring VersionMap
to include all combinations of LoadType
s and whether to include_deleted
#1443
Conversation
ae70971
to
bf97d88
Compare
LOAD_TIME_FROM
and LOAD_DOWNTO
VersionMap
to include all combinations of LoadType
s and whether to include_deleted
9a3b148
to
f2cc889
Compare
5636473
to
c0e5f35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've focused on the tests in this review as AlexO has done the rest.
7ab9631
to
559f203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with this other than the remark about the tombstone being added twice
cpp/arcticdb/version/version_map.hpp
Outdated
return cached_timestamp <= requested_load_strategy.load_from_time_.value(); | ||
} | ||
case LoadType::LOAD_ALL: | ||
// All versions are cached if the last time we loaded the cache we used a LOAD_ALL strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is too long, how about "Can only use cache when it was populated by a LOAD_ALL call, in which case it is only unsafe to use when the cache is of undeleted versions and the request is for all versions".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment as suggested
@@ -202,13 +201,12 @@ inline version_store::TombstoneVersionResult tombstone_version( | |||
stream_id, version_id); | |||
} | |||
// We will write a tombstone key even when the index_key is not found | |||
tombstone = version_map->write_tombstone(store, version_id, stream_id, entry, creation_ts); | |||
version_map->write_tombstone(store, version_id, stream_id, entry, creation_ts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a call to write_tombstone
in delete_version_if_undeleted
which I think needs to be updated. As things stand, it will be adding the tombstone twice, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed - the function that needs changing is actually in the enterprise repo, which I had checked out while doing this review. So this is fine - we just need to tweak the enterprise repo after this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Draft pull request to do this: https://github.com/man-group/arcticdb-enterprise/pull/86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved but please wait for AlexO and William to have a look too before merging
559f203
to
72365e1
Compare
8c82d76
to
c059481
Compare
c059481
to
d2b86ac
Compare
1cdb249
to
867d3be
Compare
stream_version_data.react(query_2); | ||
ASSERT_EQ(stream_version_data.count_, 2); | ||
ASSERT_EQ(stream_version_data.load_param_.load_type_, LoadType::LOAD_DOWNTO); | ||
ASSERT_EQ(stream_version_data.load_param_.load_until_version_, 4); | ||
ASSERT_EQ(stream_version_data.load_strategy_.load_type_, LoadType::LOAD_DOWNTO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we lose the extra LOAD? as in LoadType::LATEST not LoadType::LOAD_LATEST? It's a bit smurf naming at the moment (not your fault) https://devcards.io/smurf-naming-convention
@@ -49,7 +49,7 @@ std::vector<AtomKey> backwards_compat_write_and_prune_previous(std::shared_ptr<S | |||
log::version().debug("Version map pruning previous versions for stream {}", key.id()); | |||
|
|||
std::vector<AtomKey> output; | |||
auto entry = version_map->check_reload(store, key.id(), LoadParameter{LoadType::LOAD_ALL}, __FUNCTION__); | |||
auto entry = version_map->check_reload(store, key.id(), LoadStrategy{LoadType::LOAD_ALL, LoadObjective::ANY}, __FUNCTION__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find a LoadObjective of ANY a bit confusing. Any what?
cpp/arcticdb/version/version_map.hpp
Outdated
if (entry.loaded_until_ <= static_cast<VersionId>(load_param.load_until_version_.value())) { | ||
bool loaded_as_far_as_version_id(const VersionMapEntry& entry, SignedVersionId requested_version_id) const { | ||
if (requested_version_id >= 0) { | ||
if (entry.loaded_with_progress_.oldest_loaded_index_version_ <= static_cast<VersionId>(requested_version_id)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entry.load_progress_?
661ef67
to
a55eba5
Compare
a55eba5
to
8f59430
Compare
- We fail to exit early when following the version chain with both LOAD_DOWNTO and LOAD_FROM_TIME. Adds a cpp test to show that - Fix will be done in follow up commits
- Before this change we had to remember to update the version_map entry every time after calling write_tombstone. Now write_tombstone does it internally. - This is a small change which should not affect behavior. - Tests are not updated regarding this change because they will be revamped in following commmits.
- Unify loaded_until_ and oldest_loaded_index_version_ as they should always have the same version (because version chain is sorted) - Store a LoadProgress inside VersionMapEntry instead of a loaded_until - The LoadProgress has information about earliest loaded version and earliest loaded timestamp. This allows us to resolve the TODOs inside the has_cached_entry checks. - Now we can always determine whether LOAD_FROM_TIME needs to read from cache by comparing with the LoadProgress timestamp. - Revamps CacheInvalidation cpp tests to cover the new functionality - Add an earliest_loaded_undeleted_timestamp to make follow up refactor easier. - Makes the refactor in follow up commit easier to reason about.
- We keep an oldest_loaded_undeleted_index_version in the LoadProgress - We use it so that if we tombstone that version (either by a tombstone or tombstone_all) we invalidate the cache for undeleted versions
Introduces a new type LoadStrategy which has the load_type, the to_load enum, which indicates whether to load ANY or only UNDELETED versions and the version/time we need to search for. With this commit we mostly preserve existing behavior with a few exceptions: - The newly introduced deleted/undeleted variants of LOAD_DOWNTO and LOAD_FROM_TIME are introduced and are handled appropriately and their behavior is documented in LoadStrategy. - Abstract away the batch version map merging of LoadStrategies and fix a small bug there. - Fixes #1386 by passing include_deleted=false when looking for as_of time or version Also adds an elaborate FollowingVersionChain and FollowingVersionChainWithCaching cpp tests
- Shouldn't change behavior.
Before this change version_map's reload_interval was equal to the unsync_tolerance. This means that we would ~always miss the cache. The check for outdated cache is as follows: - When reading into cache set `last_reload_time = now - unsync_tolerance` - When checking if cache is outdated we check whether `now - last_reload_time > reload_interval This change updates the unsync tolerance to 200ms. This will mean the cache is valid for 1.8s Tests uncovered that we were not flushing the version_map when clearing the storage. This change fixes this as well.
This commit is only a rename
`iterate_on_failure` used to provide a back up way to load versions if the ref key is unreadable for some reason. The ref key should no longer be unreadable, thus `iterate_on_failure` should be removed. This commit removes the `iterate_on_failure` - argument in the `lib._nvs` functions - parameter of `VersionQuery` - parameter of `LoadParameter` and entirely removes the `LoadParameter` and replaces it with `LoadStrategy` - removes IterateOnFailure cpp test which used to test this behavior by artificially removing the ref key.
The lib._nvs api needs to be stable so we re-add the iterate_on_failure argument to not break existing usage. Instead whenever the argument is passed we issue a warning that it's deprecated.
This commit is just a rename addressing some code review comments.
8f59430
to
272c32e
Compare
Reference Issues/PRs
Fixes #1384
What does this implement or fix?
The #1384 bug also applies to
as_of=<VERSION_ID>
it is just not seen in the repro because it is not looking for an old enough version.The bug is that we perform the check on
TOMBSTONE_ALL
only forLOAD_UNDELETED
andLOAD_LATEST_UNDELETED
. HoweverLOAD_DOWNTO
andLOAD_FROM_TIME
should also only work on undeleted versions when querying withas_of
.As discussed we would like to have the ability to both load deleted and undeleted with
LOAD_DOWNTO
andLOAD_FROM_TIME
. This essentially leaves us with the full cartesian product of possible load types and whether to include deleted or not.This the majority of this change is to do the refactor to extract the
include_deleted
out of the load type and ensure correctness.The PR is split in several commits, each one of which should be reviewed separately. The commits have detailed descriptions but in a few words the commits do the following:
write_tombstone
LoadType
into aLoadStrategy
which encapsulates theLoadType
,include_deleted
and the timestamp or version forLOAD_DOWNTO
andLOAD_FROM_TIME
. This fixes the Poor performance when reading as_of a date with many early versions deleted #1384 bug and provides new functionalityfollow_version_chain
clearer.unsync_tolerance
because previously we were ~always skipping the cache and deeming it outdated. Also fixes a bug where we don't clear the cache when clearing the storage.Any other comments?
Checklist
Checklist for code changes...